Markup Infrastructure for the Anaphoric Bank: Supporting Web Collaboration

Poesio, Massimo; Diewald, Nils; Stührenberg, Maik; Chamberlain, Jon; Jettka, Daniel; Goecke, Daniela; Kruschwitz, Udo

doi:10.1007/978-3-642-22613-7_10

Massimo Poesio⁷,
Nils Diewald⁸,
Maik Stührenberg⁸,
Jon Chamberlain⁷,
Daniel Jettka⁸,
Daniela Goecke⁸ &
…
Udo Kruschwitz⁷

Part of the book series: Studies in Computational Intelligence ((SCI,volume 370))

856 Accesses
2 Citations
1 Altmetric

Abstract

Modern NLP systems rely either on unsupervised methods, or on data created as part of governmental initiatives such as MUC, ACE, or GALE. The data created in these efforts tend to be annotated according to task-specific schemes. The Anaphoric Bank is an attempt to create large quantities of data annotated with anaphoric information according to a general purpose and linguistically motivated scheme. We do this by pooling smaller amounts of data annotated according to rich schemes that are by and large compatible, and by taking advantage of Web collaboration. In this chapter we discuss the markup infrastructure that underpins the two modalities of Web collaboration in the project: expert annotation and game-based annotation.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

von Ahn, L.: Games with a purpose. Computer 39(6), 92–94 (2006)
Article Google Scholar
Bird, S., Liberman, M.: Annotation graphs as a framework for multidimensional linguistic data analysis. In: Proceedings of the Workshop ”Towards Standards and Tools for Discourse Tagging”, Association for Computational Linguistics, pp. 1–10 (1999), http://xxx.lanl.gov/abs/cs.CL/9907003
Broeder, D., Kemps-Snijders, M., Uytvanck, D.V., Windhouwer, M., Withers, P., Wittenburg, P., Zinn, C.: A data category registry- and component-based metadata framework. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC), European Language Resources Association (ELRA), Valletta, Malta, pp. 43–47 (2010)
Google Scholar
Chamberlain, J., Poesio, M., Kruschwitz, U.: Phrase Detectives: A Web-based collaborative annotation game. In: iSemantics (2008)
Google Scholar
Clark, H.H.: Bridging. In: Johnson-Laird, P.N., Wason, P.C. (eds.) Thinking: Readings in Cognitive Science, pp. 411–420. Cambridge University Press, Cambridge (1977)
Google Scholar
DCMI Usage Board, DCMI Metadata Terms. DCMI Recommendation, Dublin Core Metadata Initiative (2006), http://dublincore.org/documents/dcmi-terms/
van Deemter, K., Kibble, R.: On coreferring: Coreference in MUC and related annotation schemes. Computational Linguistics 26(4), 629–637 (2000)
Article Google Scholar
Diewald, N.: Serengeti – A brief Starting Guide. Technical manual (2008), http://www.text-technology.de/publications/serengeti_guide.pdf
Diewald, N., Stührenberg, M., Garbar, A., Goecke, D.: Serengeti – Webbasierte Annotation semantischer Relationen. LDV Forum 23(2) (2008)
Google Scholar
Dipper, S.: XML-based Stand-off Representation and Exploitation of Multi-Level Linguistic Annotation. In: Proceedings of Berliner XML Tage 2005 (BXML 2005), Berlin, Germany, pp. 39–50 (2005)
Google Scholar
Garrett, J.J.: Ajax: A new approach to web applications (2005), http://adaptivepath.com/ideas/essays/archives/000385.php , http://adaptivepath.com/ideas/essays/archives/000385.php
Hirschman, L.: MUC-7 coreference task definition, version 3.0. In: Chinchor, N. (ed.) Proceedings of the 7th Message Understanding Conference (1998), http://www.muc.saic.com/proceedings/muc_7_toc.html
Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R.: Ontonotes: the 90% solution. In: Proc. HLT-NAACL (2006)
Google Scholar
Ide, N., Suderman, K.: GrAF: A Graph-based Format for Linguistic Annotations. In: Proceedings of the Linguistic Annotation Workshop, Association for Computational Linguistics, Prague, Czech Republic, pp. 1–8 (2007)
Google Scholar
IMDI (ISLE Metadata Initiative) Metadata Elements for Session Descriptions. version 3.0.4. Reference Document, MPI, Nijmegen (2003), http://www.mpi.nl/IMDI/documents/Proposals/IMDI_MetaData_3.0.4.pdf
IMDI (ISLE Metadata Initiative) Metadata Elements for Catalogue Descriptions. version 3.0.0. Tech. rep., MPI, Nijmegen (2004), http://www.mpi.nl/IMDI/documents/Proposals/IMDI_Catalogue_3.0.0.pdf
Johnson, N.L., Rasmussen, S., Joslyn, C., Rocha, L., Smith, S., Kantor, M.: Symbiotic Intelligence: Self-Organizing Knowledge on Distributed Networks Driven by Human Interaction. In: Proceedings of the Sixth International Conference on Artificial Life. MIT Press, Cambridge (1998)
Google Scholar
Jurafsky, D., Martin, J.H.: Speech and Language Processing, 2nd edn. Prentice-Hall, Englewood Cliffs (2008)
Google Scholar
Krasavina, O., Chiarcos, C.: PoCoS – Potsdam Coreference Scheme. In: Proceedings of The Linguistic Annotation Workshop, Association for Computational Linguistics, pp. 156–163 (2007), http://acl.ldc.upenn.edu/W/W07/W07-1525.pdf
Kruschwitz, U., Chamberlain, J., Poesio, M.: (Linguistic) Science Through Web Collaboration in the ANAWIKI Project. In: Proceedings of WebSci 2009, Athens (2009)
Google Scholar
Morton, T., LaCivita, J.: WordFreak: An Open Tool for Linguistic Annotation. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Edmonton, Canada, pp. 17–18 (2003)
Google Scholar
Müller, C., Strube, M.: Multi-level annotation of linguistic data with mmax2. In: Braun, S., Kohn, K., Mukherjee, J. (eds.) Corpus Technology and Language Pedagogy. New Resources, New Tools, New Methods, English Corpus Linguistics, Peter Lang, vol. 3, pp. 197–214 (2006)
Google Scholar
Navarretta, C.: Abstract anaphora resolution in Danish. In: Dybkjaer, L., Hasida, K., Traum, D. (eds.) Proc. of the 1st SIGdial Workshop on Discourse and Dialogue, ACL, pp. 56–65 (2000)
Google Scholar
Orăsan C, PALinkA: A highly customisable tool for discourse annotation. In: Proceedings of the Fourth SIGdial Workshop on Discourse and Dialogue, Sapporo, Japan (2003)
Google Scholar
Poesio, M.: Discourse annotation and semantic annotation in the GNOME corpus. In: Proc. of the ACL Workshop on Discourse Annotation, Barcelona, pp. 72–79 (2004)
Google Scholar
Poesio, M.: The MATE/GNOME scheme for anaphoric annotation, revisited. In: Proceedings of SIGDIAL, Boston (2004)
Google Scholar
Poesio, M., Artstein, R.: The reliability of anaphoric annotation, reconsidered: Taking ambiguity into account. In: Proceedings of The ACL Workshop on Frontiers in Corpus Annotation, Association for Computational Linguistics, pp. 76–83 (2005), http://acl.ldc.upenn.edu/W/W05/W05-0311.pdf
Sasaki, F., Wegener, C., Witt, A., Metzing, D., Pönninghaus, J.: Co-reference annotation and resources: A multilingual corpus of typologically diverse languages. In: Proceedings of the 3nd International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas, Spain (2002)
Google Scholar
Simons, G., Bird, S.: OLAC Metadata. OLAC: Open Language Archives Community (2003), http://www.language-archives.org/OLAC/metadata.html
Siorpaes, K., Hepp, M.: Games with a purpose for the semantic web. IEEE Intelligent Systems 23(3), 50–60 (2008)
Article Google Scholar
Stührenberg, M., Goecke, D.: SGF – An integrated model for multiple annotations and its application in a linguistic domain. In: Proceedings of Balisage: The Markup Conference, Montreal, Kanada (2008), http://www.balisage.net/Proceedings/html/2008/Stuehrenberg01/Balisage2008-Stuehrenberg01.html
Stührenberg, M., Jettka, D.: A toolkit for multi-dimensional markup: The development of SGF to XStandoff. In: Proceedings of Balisage: The Markup Conference, Montréal, Québec, Balisage Series on Markup Technologies (2009)
Google Scholar
Stührenberg, M., Goecke, D., Diewald, N., Cramer, I., Mehler, A.: Webbased Annotation of Anaphoric Relations and Lexical Chains. In: Proceedings of The Linguistic Annotation Workshop, Association for Computational Linguistics, pp. 140–147 (2007), http://acl.ldc.upenn.edu/W/W07/W07-1523.pdf
Thompson, H.S., McKelvie, D.: Hyperlink semantics for standoff markup of read-only documents. In: Proceedings of SGML Europe 1997: The next decade – Pushing the Envelope, Barcelona, pp. 227–229 (1997), http://www.ltg.ed.ac.uk/~ht/sgmleu97.html
Waltinger, U., Mehler, A., Stührenberg, M.: An integrated model of lexical chaining: application, resources and its format. In: Storrer, A., Geyken, A., Siebert, A., Würzner, K.M. (eds) KONVENS 2008 – Ergänzungsband Textressourcen und lexikalisches Wissen, Berlin, pp. 59–70 (2008)
Google Scholar
Witt, A., Goecke, D., Sasaki, F., Lüngen, H.: Unification of XML Documents with Concurrent Markup. Literary and Lingustic Computing 20(1), 103–116 (2005)
Article Google Scholar
Witt, A., Stührenberg, M., Goecke, D., Metzing, D.: Integrated linguistic annotation models and their application in the domain of antecedent detection. In: Mehler, A., Kühnberger, K.U., Lobin H., Lüngen, H., Storrer, A., Witt, A. (eds.) Modelling, Learning and Processing of Text Technological Data Structures, Studies in Computational Intelligence, Springer, Heidelberg (2011)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom
Massimo Poesio, Jon Chamberlain & Udo Kruschwitz
Faculty of Linguistics and Literary Studies, Bielefeld University, Universitätsstraße 25, D-33615, Bielefeld, Germany
Nils Diewald, Maik Stührenberg, Daniel Jettka & Daniela Goecke

Authors

Massimo Poesio
View author publications
You can also search for this author in PubMed Google Scholar
Nils Diewald
View author publications
You can also search for this author in PubMed Google Scholar
Maik Stührenberg
View author publications
You can also search for this author in PubMed Google Scholar
Jon Chamberlain
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Jettka
View author publications
You can also search for this author in PubMed Google Scholar
Daniela Goecke
View author publications
You can also search for this author in PubMed Google Scholar
Udo Kruschwitz
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Faculty of Linguistics and Literature, Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, Germany
Alexander Mehler
Institute of Cognitive Science, University of Osnabrück, Albrechtstr. 28, 49076, Osnabrück, Germany
Kai-Uwe Kühnberger
Angewandte Sprachwissenschaft und, Justus-Liebig-Universität Gießen, Computerlinguistik, Otto-Behaghel-Straße 10D, 35394, Gießen, Germany
Henning Lobin & Harald Lüngen &
Institut für deutsche Sprache und Literatur, Technical University Dortmund, Emil-Figge-Straße 50, 44227, Dortmund, Germany
Angelika Storrer
SFB 441 Linguistic Data Structures, Eberhard Karls Universität Tübingen, Nauklerstraße 35, 72074, Tübingen, Germany
Andreas Witt

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Poesio, M. et al. (2011). Markup Infrastructure for the Anaphoric Bank: Supporting Web Collaboration. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_10

Download citation

DOI: https://doi.org/10.1007/978-3-642-22613-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22612-0
Online ISBN: 978-3-642-22613-7
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics